1. Introduction


We consider the general Gauss-Markoff model

$$Y = X\beta + \varepsilon \qquad (1.1)$$

where $E(\varepsilon) = 0$, $D(\varepsilon) = \sigma^2 V$, and the matrices $X$ and $V$ may be singular, and discuss problems of inference on the unknown parameters $\beta$ and $\sigma^2$. We refer to the model (1.1) by the triplet $(Y, X\beta, \sigma^2 V)$. The paper is in three parts. In the first part, the Gauss-Markoff theory is extended to the case when $V$ is singular. In the second, robustness of inference procedures for departures in the design matrix $X$, the dispersion matrix $V$ and distributional assumptions on $Y$ is considered. The third part introduces the concepts of linear sufficiency and completeness in linear models, without making any distributional assumptions.

The  following  notations  are  used  throughout  the  paper. 

(i) $\rho(A)$ denotes the rank of a matrix $A$ and $R(A)$, the range of $A$, i.e., the vector space generated by the columns of $A$.

(ii) $A^{-}$ denotes a generalized inverse of $A$, satisfying the only condition $AA^{-}A = A$ (see Rao, 1973, p. 34).

(iii) $Z$ denotes a matrix of full rank, satisfying the condition $Z'X = 0$, where $X$ is the design matrix.

(iv) The projection operators on $R(A)$ are denoted by (see Rao 1973, p. 48)

$$P_{A,M} = A(A'MA)^{-}A'M \quad \text{where } M \text{ is p.d. (positive definite)},$$

$$P_{A} = A(A'A)^{-}A'.$$

(v) $E$ denotes the expectation operator and $D$ the dispersion operator (providing the variance-covariance matrix of a vector variable).

(vi) For any matrix $L$, $\ker L'$ consists of all vectors $a$ with $L'a = 0$.

(vii) $Y$: $n \times 1$; $X$: $n \times m$ with $\rho(X) = r \le m$; $\beta$: $m \times 1$.

2.  Unified  Approach  to  Linear  Estimation 

In this section, we consider some methods of estimating the unknown parameters $\beta$ and $\sigma^2$ in the general model (1.1).

2.1  Inverse  partitioned  matrix  approach. 

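Let

$$\begin{pmatrix} V & X \\ X' & 0 \end{pmatrix}^{-} = \begin{pmatrix} C_1 & C_2 \\ C_3 & -C_4 \end{pmatrix}$$

for any g-inverse. Then the following proposition is proved in Rao (1971).

Proposition 2.1

(i) In the class of linear estimators $L'Y$ such that $X'L = p$, the minimum variance linear unbiased estimator (MVLUE) of $p'\beta$ is $p'\hat\beta$ where

$$\hat\beta = C_3 Y \quad \text{or} \quad C_2' Y.$$

(ii) If $p'\hat\beta$ and $q'\hat\beta$ are MVLUEs of $p'\beta$ and $q'\beta$ respectively, then

$$\operatorname{Var}(p'\hat\beta) = \sigma^2 p'C_4 p,$$

$$\operatorname{Cov}(p'\hat\beta, q'\hat\beta) = \sigma^2 p'C_4 q = \sigma^2 q'C_4 p.$$

(iii) An unbiased estimator of $\sigma^2$ is

$$\hat\sigma^2 = f^{-1} Y'C_1 Y, \quad f = \rho(V:X) - \rho(X).$$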
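The identities behind Proposition 2.1 can be checked numerically. The following is a minimal sketch, assuming NumPy and using the Moore-Penrose inverse as one particular choice of g-inverse; the helper name `ipm_blocks`, the random test matrices and the tolerances are illustrative assumptions, not part of the original development.

```python
import numpy as np

def ipm_blocks(V, X):
    """Return C1, C2, C3, C4 from one g-inverse of [[V, X], [X', 0]]."""
    n, m = X.shape
    A = np.block([[V, X], [X.T, np.zeros((m, m))]])
    G = np.linalg.pinv(A, rcond=1e-10)   # Moore-Penrose inverse: one g-inverse
    C1, C2 = G[:n, :n], G[:n, n:]
    C3, C4 = G[n:, :n], -G[n:, n:]       # the lower-right block is -C4
    return C1, C2, C3, C4

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, 3))
V = B @ B.T                                   # singular dispersion matrix (rank 3)
X = rng.standard_normal((n, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])   # rank 2 with m = 3 columns

C1, C2, C3, C4 = ipm_blocks(V, X)

# X C3 X = X, so X C3 Y is unbiased for X beta.
unbiased = np.allclose(X @ C3 @ X, X)
# D(X C3 Y) / sigma^2 = X C3 V C3' X' equals X C4 X', as in part (ii).
disp_ok = np.allclose(X @ C3 @ V @ C3.T @ X.T, X @ C4 @ X.T)
# Degrees of freedom f = rank(V:X) - rank(X); tr(C1 V) should equal f.
f = (np.linalg.matrix_rank(np.hstack([V, X]), tol=1e-8)
     - np.linalg.matrix_rank(X, tol=1e-8))
trace_C1V = np.trace(C1 @ V)
```

Here `trace_C1V` agreeing with `f` is what makes $f^{-1}Y'C_1Y$ unbiased for $\sigma^2$ in part (iii), since $X'C_1X = 0$.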

2.2  Unified  theory  of  least  squares 

When $V$ is nonsingular and $Y$ has a multivariate normal distribution, we have the following well known results.

(1) Let $\hat\beta$ be such that

$$\min_{\beta} (Y - X\beta)'V^{-1}(Y - X\beta) = (Y - X\hat\beta)'V^{-1}(Y - X\hat\beta).$$

Then the MVLUE of $p'\beta$, $p \in R(X')$, is $p'\hat\beta$.

(2) $R_0^2 = (Y - X\hat\beta)'V^{-1}(Y - X\hat\beta) \sim \sigma^2\chi^2(f)$, i.e., distributed as $\sigma^2\chi^2$ on $f$ d.f., where $f = \rho(V:X) - \rho(X)$.

(3) Let $K'\beta = w$ be a linear hypothesis where $R(K) \subset R(X')$ and $\rho(K) = h$, and

$$R_1^2 = \min_{K'\beta = w} (Y - X\beta)'V^{-1}(Y - X\beta).$$

Then

$$R_1^2 - R_0^2 \sim \sigma^2\chi^2(h).$$

If $V$ is singular, the above statements are not applicable and the following question arises. Does there exist a symmetric matrix $M$ which takes the place of $V^{-1}$ and for which the above properties (1)-(3) hold? The answer is contained in Proposition 2.2 proved in Rao (1973).

Proposition 2.2 Let $M = (V + XUX')^{-}$ for any symmetric g-inverse and $U$ be any symmetric matrix such that $\rho(V:X) = \rho(V + XUX')$.

(i) If $\hat\beta$ is such that

$$\min_{\beta} (Y - X\beta)'M(Y - X\beta) = (Y - X\hat\beta)'M(Y - X\hat\beta)$$

then the MVLUE of $p'\beta$, $p \in R(X')$, is $p'\hat\beta$.

(ii) $R_0^2 = (Y - X\hat\beta)'M(Y - X\hat\beta) \sim \sigma^2\chi^2(f)$, $f = \rho(V:X) - \rho(X)$.

(iii) There is no choice of $M$ for which the property (3) also holds for all testable hypotheses.

Contrary to what is stated in (iii), claims have been made about the existence of $M$ for which the property (3) also holds. This is shown to be not true in Rao (1978).
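As an illustration of Proposition 2.2 (i), the sketch below takes $U = I$ and the Moore-Penrose inverse as the symmetric g-inverse, and compares the fitted vector $X\hat\beta$ with the one from the inverse partitioned matrix approach of Section 2.1. The data are randomly generated and the variable names are hypothetical; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, 3))
V = B @ B.T                                  # singular dispersion matrix (rank 3)
X = rng.standard_normal((n, 2))
beta = np.array([1.0, -2.0])
Y = X @ beta + B @ rng.standard_normal(3)    # Y lies in R(V:X)

# Unified least squares with U = I: M = (V + XX')^-.
T = V + X @ X.T
rank_ok = (np.linalg.matrix_rank(np.hstack([V, X]), tol=1e-8)
           == np.linalg.matrix_rank(T, tol=1e-8))
M = np.linalg.pinv(T, rcond=1e-10)
beta_hat = np.linalg.pinv(X.T @ M @ X, rcond=1e-10) @ X.T @ M @ Y
fit_unified = X @ beta_hat

# Inverse partitioned matrix fit X C3 Y, for comparison.
A = np.block([[V, X], [X.T, np.zeros((2, 2))]])
C3 = np.linalg.pinv(A, rcond=1e-10)[n:, :n]
fit_ipm = X @ C3 @ Y

# Residual quadratic form R_0^2 of part (ii).
resid = Y - fit_unified
R0_sq = resid @ M @ resid
```

The two fits agree because both are MVLUEs of $X\beta$, which are unique on $R(V:X)$; the choice $U = I$ is only one admissible choice.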

2.3  Least  squares  theory  with  derived  restrictions 

If $V$ is $n \times n$ and singular, then there exists a matrix $N$ of rank $s = n - \rho(V)$ such that $N'V = 0$, which implies that

$$N'Y - N'X\beta = 0 \quad \text{w.p.1.} \qquad (2.3.1)$$

This stochastic relationship may be considered as a restriction on the parameter $\beta$, which is known when $Y$ is observed. In such a case, the following proposition is proved by Goldman and Zelen (1964) and Mitra and Rao (1968).

Proposition 2.3 Let $V^{-}$ be any g-inverse of $V$ and $\hat\beta$ be such that

$$\min_{N'Y = N'X\beta} (Y - X\beta)'V^{-}(Y - X\beta) = (Y - X\hat\beta)'V^{-}(Y - X\hat\beta) = R_0^2.$$

Then

(i) $p'\hat\beta$ is the MVLUE of $p'\beta$, $p \in R(X')$.

(ii) $R_0^2 \sim \sigma^2\chi^2(f)$, $f = \rho(V:X) - \rho(X)$.

(iii) If

$$R_1^2 = \min_{\substack{N'Y = N'X\beta \\ K'\beta = w}} (Y - X\beta)'V^{-}(Y - X\beta)$$

then $R_1^2 - R_0^2 \sim \sigma^2\chi^2(h)$, where $h$ is the degrees of freedom of the hypothesis $K'\beta = w$ to be tested. (Note that $h$ is the rank of the variance-covariance matrix of $K'\hat\beta$ and not necessarily the rank of $K$.)
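The derived restriction (2.3.1) is easy to see numerically. In this sketch (NumPy assumed; the matrices and the seed are arbitrary illustrations), $N$ is built from the eigenvectors of $V$ with zero eigenvalue and the restriction is evaluated on one simulated observation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, 3))
V = B @ B.T                      # rank 3, so s = n - rank(V) = 2
X = rng.standard_normal((n, 2))
beta = np.array([0.5, 2.0])

# Columns of N span the null space of V (eigenvectors with zero eigenvalue).
eigval, eigvec = np.linalg.eigh(V)
N = eigvec[:, eigval < 1e-10 * eigval.max()]

# One draw of Y = X beta + e with D(e) = V (taking sigma^2 = 1).
Y = X @ beta + B @ rng.standard_normal(3)

# N'Y - N'X beta vanishes for every draw, not just on average.
restriction_residual = N.T @ Y - N.T @ X @ beta
```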


2.4  Optimal  estimators  in  a  wider  class 

In Sections 2.1 and 2.2, we considered the class of linear functions of $Y$ as estimators of $p'\beta$, $p \in R(X')$. Now we consider a wider class of functions

$$T(Y) = f(N'Y) + Y'g(N'Y), \qquad (2.4.1)$$

where $N$ is as defined in (2.3.1), $f$ is a scalar and $g$ is a vector function, as possible estimators of $p'\beta$. The following proposition is proved in Rao (1979).

Proposition 2.4

(i) $p'\beta$ has an unbiased estimator in the class (2.4.1) iff $p \in R(X')$.

(ii) If $p'\beta$ is unbiasedly estimable, then the MVLUE of $p'\beta$ in the class (2.4.1) is equivalent w.p.1 to the MVLUE of $p'\beta$ in the class of linear functions $L'Y$, as considered in Sections 2.1 and 2.2.

(iii) If $L_*'Y$ is the MVLUE of $p'\beta$ in the class $L'Y$, then a general representation of the MVLUE in the wider class (2.4.1) is

$$L_*'Y + f(N'Y) + Y'g(N'Y)$$

where the functions $f$ and $g$ are such that they can be expressed in terms of a function $h$ as

$$f(\xi) = -\xi'h(\xi), \qquad g(\xi) = N h(\xi)$$

for all $\xi \in R(N'X)$ and arbitrary outside $R(N'X)$. A similar approach was given in a paper published later by Harville (1981).


2.5  Generalized  projection  operator 

Consider the general linear model $(Y, X\beta, \sigma^2 V)$, where $V$ may be singular. It is easily seen that

$$Y \in R(V:X) \quad \text{w.p.1.}$$

The following proposition is established in Rao (1974).

Proposition 2.5 Let $Z$ be a matrix of full rank such that $Z'X = 0$. Then:

(i) $R(X)$ and $R(VZ)$ are disjoint (i.e., have only the null vector in common), and $R(X:VZ) = R(X:V)$.

(ii) The projection of $y \in R(X:VZ)$ on $R(X)$ along $R(VZ)$ can be expressed as $Py$ where $P$ is any matrix satisfying the conditions

$$PX = X, \quad PVZ = 0.$$

[Such a matrix $P$ is called a generalized projection operator, which reduces to the usual projection operator when $\rho(X:VZ) = n$, where $n$ is the order of $V$. Note that $P$ is not unique when $\rho(X:VZ) < n$.]

From the above proposition we deduce:

Proposition 2.6 Let $P$ be the projection operator on $R(X)$ along $R(VZ)$ as defined in Proposition 2.5, and $CY$ be an unbiased estimator of $X\beta$ (i.e., $CX = X$). Then

$$D(CY) - D(PY)$$

is non-negative definite, where $D$ denotes the dispersion (variance-covariance) operator, so that $PY$ is the minimum dispersion unbiased estimator of $X\beta$ in the class of linear unbiased estimators.

Proof: We have

$$D(CY) = D(CY - PY + PY) = D(CY - PY) + D(PY) + \sigma^2 (C-P)VP' + \sigma^2 PV(C-P)'. \qquad (2.5.1)$$

Since $PVZ = 0 \Rightarrow PV = AX'$ for some $A$, we have

$$(C-P)VP' = (C-P)XA' = 0$$

using the conditions $CX = X$ and $PX = X$. Thus from (2.5.1)

$$D(CY) = D(PY) + D(CY - PY),$$

and since $D(CY - PY)$ is non-negative definite, this proves Proposition 2.6.

Proposition 2.6 answers a question raised by Kempthorne (1976) on the construction of a projection operator when $V$ is singular, and provides a general method for coordinate-free estimation through the concept of a projection operator.
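One way to obtain a generalized projection operator in practice is to solve $P[X : VZ] = [X : 0]$ with a pseudo-inverse. The sketch below does this and checks the conclusion of Proposition 2.6 against the ordinary least squares projector $C = P_X$. It assumes NumPy; the helper `null_space`, the random matrices and the tolerances are illustrative choices, and the resulting $P$ is only one of the admissible solutions when $\rho(X:VZ) < n$.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis of {z : A z = 0}, computed from the SVD of A."""
    _, s, vt = np.linalg.svd(A)
    rank = int((s > tol * s.max()).sum())
    return vt[rank:].T

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, 3))
V = B @ B.T                                  # singular dispersion matrix
X = rng.standard_normal((n, 2))
Z = null_space(X.T)                          # Z'X = 0, with n - r columns

F = np.hstack([X, V @ Z])                    # columns span R(X : VZ)
target = np.hstack([X, np.zeros((n, Z.shape[1]))])
P = target @ np.linalg.pinv(F, rcond=1e-10)  # one solution of PX = X, PVZ = 0

# Compare with another linear unbiased estimator CY, here C = P_X (OLS).
C = X @ np.linalg.pinv(X)
gap = C @ V @ C.T - P @ V @ P.T              # (D(CY) - D(PY)) / sigma^2
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()
```

A smallest eigenvalue of `gap` that is zero up to rounding, or positive, is the numerical counterpart of $D(CY) - D(PY)$ being non-negative definite.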

From Proposition 2.6, we have

Proposition 2.7 Let $P$ be the projection operator on $R(X)$ along $R(VZ)$ and $(X'X)^{-}$ be any g-inverse of $X'X$. Then

(i) $p'\hat\beta$ is the MVLUE of $p'\beta$, $p \in R(X')$, where

$$\hat\beta = (X'X)^{-}X'PY.$$

(ii) An unbiased estimator of $\sigma^2$ is

$$f^{-1}\, Y'(I - P')V^{-}(I - P)Y, \quad f = \rho(V:X) - \rho(X).$$

Reference may also be made to Example 4, Rao (1973, p. 309), where an approach to linear estimation is given without appealing to concepts of linearity, unbiasedness and minimum variance. This is similar to the methods discussed in Section 4 of this paper.
3. Robustness in the Linear Model

In this Section we will discuss robustness of some statistical procedures in linear models. To be specific, we will be concerned with the robustness of best linear unbiased estimators (BLUEs) in the context of estimation, and of likelihood ratio tests in the context of tests of hypotheses, when there is specification error in the design matrix and/or in the dispersion matrix. The consequences of deviations from the assumption of normality on tests will also be discussed.

We assume the same set up as in Section 1. Let $(Y, X\beta, \sigma^2 I)$ be the assumed model while $(Y, X\beta, \sigma^2 V)$ is the correct model, resulting in specification error in the dispersion matrix. Throughout this Section we assume that $V$ is p.d. Then the BLUE of an estimable linear parametric function $A\beta$ is the same under both the models if and only if

$$A(X'X)^{-}X'VZ = 0 \quad \text{for all } Z,\ Z'X = 0. \qquad (3.1)$$

This follows from the condition that a BLUE must have zero covariance with every error function. Characterization of matrices $V$ satisfying (3.1) is well known [Rao and Mitra (1971); Rao (1967); Zyskind (1967)]. Generally, (3.1) is equivalent to the following representation of $V$:

$$V = I + X\Lambda_1 X' + Z\Lambda_2 Z' + X\Lambda_3 Z' + Z\Lambda_3' X' \qquad (3.2)$$

where $\Lambda_1$, $\Lambda_2$ and $\Lambda_3$ are arbitrary except that $A\Lambda_3 Z' = 0$ and $V$ is p.d. An equivalent representation of $V$ is the following:

$$V = I + X\Lambda_1 X' + Z\Lambda_2 Z' + X_0\Lambda_3 Z' + Z\Lambda_3' X_0' \qquad (3.3)$$

where $X_0 = X(I - A^{-}A)$, and $\Lambda_1$, $\Lambda_2$ and $\Lambda_3$ are arbitrary except that $V$ is p.d.

Some further necessary and sufficient conditions (i.e., equivalent conditions) for the representation (3.3) to hold are given in the following.


Proposition  3.1.  The  representation  (3.3)  is  equivalent  to  any  one  of  the 
following  conditions: 

(a)  Z'VZ1  -  0 

(b)  PXV~1(I-PX  y-i)  is  symmetric 

6 

(c)  (I  -  px  y_i> (I  -  Px  y-i)  is  symmetric 

G  * 

(d)  There  exists  an  orthogonal  matrix  T  such  that  T*(I-PX)T, 

T’(I-P  )T,  T,V"1(I-PX  V_X)T  and  T'(I-P  V_L)T  are  diagonal 
x0  .  *  u 

matrices. 


In  the  above  A:  kxm  with  p(A)  *  k,  Z^:  n*k  is  such  that  and  Z^Z *  0 

(k*n-p).  For  a  proof  of  the  above  proposition,  see  Mathew  and  Bhimasankaram  (1982). 
Incidentally, if we demand (3.1) to hold for all $A$ such that $R(A') \subset R(X')$, which means that for every estimable linear parametric function the BLUE is the same under both the models, then we get the following.

Proposition 3.2. (3.1) holds for all $A$ such that $R(A') \subset R(X')$ under any one of the following equivalent conditions:

(a) $X'VZ = 0$

(b) $VX = XQ$ for some $Q$

(c) $VP_X$ is symmetric

(d)  PT  is  symmetric. 

The result (a), which implies (b), was proved by Rao (1967), and (c) is due to Zyskind (1967). The result (d) appears in Mathew and Bhimasankaram (1982).
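Conditions (a)-(c) can be verified on a small example. The sketch below uses an intraclass dispersion matrix and a straight-line design containing the vector of ones; both are illustrative assumptions chosen so that the conditions hold, and NumPy is assumed. It also confirms that the ordinary and generalized least squares projectors coincide, i.e., the BLUE is unchanged.

```python
import numpy as np

n, rho = 5, 0.3
one = np.ones(n)
t = np.arange(n, dtype=float)
X = np.column_stack([one, t])                          # intercept and slope
V = (1 - rho) * np.eye(n) + rho * np.outer(one, one)   # intraclass structure, p.d.

P_X = X @ np.linalg.inv(X.T @ X) @ X.T                 # OLS projector on R(X)
_, _, vt = np.linalg.svd(X.T)
Z = vt[2:].T                                           # basis with Z'X = 0

cond_a = np.allclose(X.T @ V @ Z, 0)                   # (a) X'VZ = 0
Q = np.linalg.lstsq(X, V @ X, rcond=None)[0]
cond_b = np.allclose(V @ X, X @ Q)                     # (b) VX = XQ for some Q
cond_c = np.allclose(V @ P_X, (V @ P_X).T)             # (c) V P_X symmetric

# Projector P_{X, V^{-1}} giving the BLUE of X beta under the correct model.
Vinv = np.linalg.inv(V)
P_gls = X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv
same_blue = np.allclose(P_X, P_gls)
```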

Consider next the problem of testing $H_0: A\beta = 0$ assuming normality of the errors. It is well known that the F-test based on

$$\lambda(X, I) = \frac{Y'(I - P_X)Y}{Y'(I - P_{X_0})Y}$$

is both the likelihood ratio test (LRT) and uniformly most powerful invariant (UMPI) (under a suitable group of transformations) under the normal model $N(Y, X\beta, \sigma^2 I)$ (see, for example, Lehmann (1959)). We would like to study the robustness properties of this test, namely whether its being LRT (criterion robustness) and UMPI (inference robustness) remains valid under deviations from the assumption of normality and in the presence of specification errors in the design matrix and/or the dispersion matrix.

To begin with, note that if the correct model is $N(Y, X_1\beta, \sigma^2 V)$, denoting by $X_{10}$ the matrix $X_1(I - A^{-}A)$, the LRT for testing $H_0: A\beta = 0$ is based on

$$\lambda(X_1, V) = \frac{Y'V^{-1}(I - P_{X_1, V^{-1}})Y}{Y'V^{-1}(I - P_{X_{10}, V^{-1}})Y}. \qquad (3.4)$$

Therefore, the F-test based on $\lambda(X, I)$ under $N(Y, X\beta, \sigma^2 I)$ is LRT under $N(Y, X_1\beta, \sigma^2 V)$ if and only if

$$\lambda(X, I) \equiv \lambda(X_1, V) \quad \text{for all } Y. \qquad (3.5)$$

Under the same design matrix $X_1 = X$ but a different dispersion matrix, the condition on the representation of $V$ is the following:

$$V = I + X\Lambda_1 X' + (s-1)ZZ' + X_0\Lambda_3 Z' + Z\Lambda_3' X_0' \qquad (3.6)$$

where $\Lambda_1$, $\Lambda_3$ are arbitrary and $s$ is an arbitrary positive real number subject to (i) $V$ is p.d. and (ii) $Z_1'X\Lambda_1 X'Z_1 = (s-1)I_k$. The following proposition provides other equivalent conditions.

Proposition 3.3. $V$ has the representation (3.6) if and only if any one of the following equivalent conditions holds:

(a) $(I - P_{X_0})V(I - P_{X_0}) = a(I - P_{X_0})$ for some $a > 0$

(b) $V^{-1}(I - P_{X_0, V^{-1}}) = a(I - P_{X_0})$ for some $a > 0$

(c)  (*  PX)  (V-aI)(I-PY:  PL')  •  0,  for  some  a  >  0,  with  A"  LX. 

LP  ax 

LiX 
Part (c) of this proposition is due to Khatri (1980) and parts (a), (b) are due to Mathew and Bhimasankaram (1982). The representation (3.6) is due to Rao (1967). When $V$ has the intraclass covariance structure, $V = (1-\rho)I + \rho 11'$, proceeding directly Ghosh and Sinha (1980) noted that $\lambda(X, I) \equiv \lambda(X, V)$ if and only if $1 \in R(X_0)$. Some generalizations of (3.2) and (3.6) are reported in Chikuse (1981).
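The intraclass case can be illustrated numerically. The sketch below evaluates the statistic of (3.4) for a straight-line regression in which the hypothesis sets the slope to zero, so that the reduced design $X_0$ spans the vector of ones. NumPy is assumed; the helpers `proj` and `lam`, the design and the value of $\rho$ are hypothetical choices for this example only.

```python
import numpy as np

def proj(X, M):
    """P_{X,M} = X (X'MX)^- X'M."""
    return X @ np.linalg.pinv(X.T @ M @ X) @ X.T @ M

def lam(Y, X, X0, V):
    """Y'V^{-1}(I - P_{X,V^{-1}})Y / Y'V^{-1}(I - P_{X0,V^{-1}})Y."""
    Vinv = np.linalg.inv(V)
    I = np.eye(len(Y))
    num = Y @ Vinv @ (I - proj(X, Vinv)) @ Y
    den = Y @ Vinv @ (I - proj(X0, Vinv)) @ Y
    return num / den

n, rho = 6, 0.4
one = np.ones(n)
t = np.arange(n, dtype=float)
X = np.column_stack([one, t])     # full design: intercept and slope
X0 = one[:, None]                 # same range as X(I - A^-A) for A = (0, 1)
V = (1 - rho) * np.eye(n) + rho * np.outer(one, one)

rng = np.random.default_rng(4)
Y = rng.standard_normal(n)
lam_I = lam(Y, X, X0, np.eye(n))  # statistic under the assumed model
lam_V = lam(Y, X, X0, V)          # statistic under the intraclass model
```

Since $1 \in R(X_0)$ here, the two values agree for every $Y$, not only for the simulated one.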

Under the same dispersion matrix $I$ but a different design matrix $X_1$, the F-test remains LRT if and only if $\lambda(X, I) \equiv \lambda(X_1, I)$. This leads to the following.

Proposition 3.4. $\lambda(X, I) \equiv \lambda(X_1, I)$ for all $Y$ if and only if

$$R(X) = R(X_1) \quad \text{and} \quad R(X_0) = R(X_{10}). \qquad (3.7)$$

Finally, the following proposition provides conditions under which (3.5) holds for arbitrary $V$ and $X_1$.

Proposition 3.5. (3.5) holds if and only if (3.6) and (3.7) hold.

Propositions 3.4 and 3.5 are due to Mathew and Bhimasankaram (1982).

The key to all these results, noted earlier by Sinha and Mukhopadhyay (1980), can be stated in the following most general form with a different, simpler proof due to Müller and Sinha.

Lemma 3.6: Let $A, B, C, D$ be symmetric matrices such that

$$\frac{y'Ay}{y'By} = \frac{y'Cy}{y'Dy} \quad \text{for almost all } y.$$

If there is an $x$ such that $Ax = 0$ and $x'Bx \ne 0$ then

$$C = \gamma A \quad \text{for some } \gamma \in \mathbb{R}.$$

Also

$$D = \gamma B$$

provided $A \ne 0$.
Proof: From the assumption it follows immediately that

$$y'Ay \; y'Dy = y'Cy \; y'By \quad \text{for all } y.$$

In particular, for $y = x$ this results in $x'Cx = 0$. Now insert $(y + \lambda x)$ to obtain

$$y'Ay\,[\,y'Dy + 2\lambda\, y'Dx + \lambda^2 x'Dx\,] = [\,y'Cy + 2\lambda\, y'Cx\,]\,[\,y'By + 2\lambda\, y'Bx + \lambda^2 x'Bx\,].$$

Comparison of the coefficients of $\lambda^3$ yields

$$0 = y'Cx\,(x'Bx) \quad \text{for all } y,$$

whence $Cx = 0$. Therefore the coefficients of $\lambda^2$ become

$$y'Ay \; x'Dx = y'Cy \; x'Bx$$

from which

$$C = \frac{x'Dx}{x'Bx}\, A$$

follows. The remainder is evident.

We  now  turn  our  attention  to  the  robustness  properties  of  the  F-test 
for  non-normal  errors.  The  following  result  was  proved  by  Ghosh  and  Sinha 
(1980). 

Proposition 3.7. Let $Y = X\beta + \sigma\varepsilon$ with $\varepsilon$ distributed according to a density $f(\varepsilon)$ given by


$$f(\varepsilon) = \int \frac{\tau^{n/2}}{(\sqrt{2\pi})^{n}} \exp\left(-\tfrac{1}{2}\tau\,\varepsilon'\varepsilon\right) dL(\tau).$$

Then for testing $A\beta = 0$, the F-test based on $\lambda(X, I)$ is both LRT and UMPI.

Recently Sinha and Drygas (1982) generalized this result to the following.

Proposition 3.8. Let $Y = X\beta + \sigma\varepsilon$ with $\varepsilon$ distributed according to a density $q(\varepsilon'\varepsilon)$, with $q$ decreasing and convex. Then the F-test based on $\lambda(X, I)$ is both LRT and UMPI.

This result is similar to a robustness property of Hotelling's $T^2$-test proved by Kariya (1981) and is based on an application of a representation theorem due to Wijsman (1967). Under a slightly more general distribution of the errors, the following property of a BLUE holds (see Sinha and Drygas (1982)).

Proposition 3.9. Let $Y = X\beta + \sigma\varepsilon$ with $\varepsilon$ having a spherically symmetric distribution. Then for any $c \in \mathbb{R}$ and any n.n.d. matrix $C$ of appropriate order

$$P\{(Gy - A\beta)'C(Gy - A\beta) \le c^2\} \ge P\{(Ly - A\beta)'C(Ly - A\beta) \le c^2\}$$

where $Gy$ is any BLUE of estimable $A\beta$ and $Ly$ is any unbiased estimator of $A\beta$.

4. Sufficiency and Completeness in the Linear Model

The well-tried principle of sufficiency has features some of which give rise to a similar concept in the linear model when no distributional assumptions are made. Suppose, for instance, $s$ is a sufficient statistic for some parameter $\theta$ and $t$ is independent of $s$. In this case the expected value of any integrable function $h(t)$ can be written as

$$E_\theta h(t) = E(h(t) \mid s) = \phi(s) \quad \text{a.s.}$$

which is a function independent of $\theta$. Note that $\phi(\cdot)$ must be constant, i.e., $t$ is ancillary, if all underlying distributions share the same null sets (cf. Basu (1958)). It might have been the above equation that led Barnard (1963) to his notion of linear sufficiency. Adjusted to our model $(Y, X\beta, \sigma^2 V)$ it is as follows.


Definition 4.1: A linear statistic $Ly$ is called linearly sufficient if for all linear functions $c'y$ uncorrelated with $Ly$ there is a $b$ such that $E_\beta(c'y) = b'Ly$ a.s.

If $V$ is regular this simply means that the expected value of $c'y$ does not depend on $\beta$. Another approach to the idea of linear sufficiency arises from the fact that uniformly minimum variance unbiased estimators are functions of each sufficient statistic. In accordance with this is a definition of Baksalary and Kala (1981), although they originally used a different terminology.

Definition 4.2: A linear statistic $Ly$ is called linearly sufficient if for each linear estimable function $p'\beta$ the BLUE is a linear function $b'Ly$ of $Ly$.

On the other hand one may consider that the best prediction of $y$ given any statistic $s$ is the conditional expectation $E_\theta(y \mid s)$, which is independent of $\theta$ if $s$ is sufficient. Reduced to linear terms this property results in a definition that is due to Drygas (1983).

Definition 4.3: A linear statistic $Ly$ is called linearly sufficient if the best linear predictor of $y$ given $Ly$ (written $\mathrm{BLP}(y \mid Ly)$) is independent of $\beta$.

If the distribution of $y$ has a density $p_\theta$ then, under certain regularity conditions, Fisher's information matrix $I$ is well defined. In this case a statistic $s$ is sufficient if and only if its information matrix $I_s$ equals the original $I$. In the linear model the assumptions above are met when $y$ is normally distributed and $R(X)$ is contained in $R(V)$. Then the information matrix for the parameter $\beta$ reads

$$I = \frac{1}{\sigma^2}\, X'V^{-}X.$$

This may be regarded as an information measure as well without the normal supposition. One can define therefore:

Definition 4.4: If $R(X) \subset R(V)$, a linear statistic $Ly$ is called linearly sufficient if $I_L = I$.


Each of these definitions can be transformed into algebraic terms, which all turn out to be equivalent. We present two of the handier ones.

Proposition 4.5. $Ly$ is linearly sufficient if and only if $R(X) \subset R(WL')$ or, equivalently, $\ker L \cap R(W) \subset V(\ker X')$. (See Baksalary and Kala (1981), Müller (1982).)
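The range condition of Proposition 4.5 can be tested by comparing ranks. In the sketch below the matrix $W$ is taken to be $V + XX'$; this is an assumption made for the example, since the text above does not spell out $W$. The helper `contains`, the random matrices and the two candidate statistics are likewise illustrative, and NumPy is assumed.

```python
import numpy as np

def contains(A, B, tol=1e-8):
    """True if R(B) is a subspace of R(A), checked by comparing ranks."""
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    rank_AB = np.linalg.matrix_rank(np.hstack([A, B]), tol=tol)
    return rank_AB == rank_A

rng = np.random.default_rng(5)
n = 6
B = rng.standard_normal((n, 3))
V = B @ B.T                       # singular dispersion matrix
X = rng.standard_normal((n, 2))
W = V + X @ X.T                   # ASSUMPTION: one common choice of W

# Candidate 1: L = X'W^-, the statistic behind the BLUE of X beta.
L_good = X.T @ np.linalg.pinv(W, rcond=1e-10)
# Candidate 2: keep only the first observation y_1.
L_bad = np.eye(n)[:1]

suff_good = contains(W @ L_good.T, X)   # R(X) inside R(W L')?
suff_bad = contains(W @ L_bad.T, X)
```

With these choices `suff_good` holds because $WW^{-}X = X$ when $R(X) \subset R(W)$, while a single observation cannot span the two-dimensional $R(X)$.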

If $y$ is normal with known variance and, in addition, $R(X)$ is a subspace of $R(V)$ then it follows immediately from Definition 4.4 that sufficiency and linear sufficiency are equivalent notions. This attractive property can likewise be confirmed without the regularity condition, as was shown by Drygas (1983) and Müller (1982).

Proposition 4.6. If $y$ is normal with known variance then $Ly$ is linearly sufficient if and only if it is sufficient.

But the concept of linear sufficiency also makes some sense without the normal supposition, as becomes evident from Definition 4.2 and from the following formulation, which might be called a linear version of the Rao-Blackwell theorem (see Rao (1973)): Let $Ly$ be linearly sufficient and $a'\beta$ be any parametric function estimated by $c'y$, say. Then $\mathrm{BLP}(c'y \mid Ly)$ has the same bias as $c'y$ but smaller mean squared error. That means not only BLUEs but all admissible linear estimators are linear functions of $Ly$. (As for admissibility see Rao (1976).)

Sufficient  statistics  are  especially  useful  when  they  are  complete. 

The  linear  analogue  of  this  concept  arises  quite  naturally. 

Definition 4.7: A linear statistic $Ly$ is called linearly complete if each linear function $a'Ly$ that has expected value 0 for all $\beta \in \mathbb{R}^m$ vanishes a.s.

Again the definition can easily be translated into an algebraic expression. Combined with this, the above conditions for linear sufficiency turn from inclusions into equations. For normal variables Drygas (1983) showed the accordance with ordinary completeness.

Proposition 4.8. $Ly$ is linearly complete if and only if $R(LV) \subset R(LX)$. If $y$ is normally distributed this is equivalent to completeness. $Ly$ is linearly sufficient and linearly complete, i.e., sufficient and complete in the normal case with known variance, if and only if $R(X) = R(WL')$ or, equivalently, $\ker L \cap R(W) = V(\ker X')$.
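The completeness condition $R(LV) \subset R(LX)$ can be checked in the same way as linear sufficiency. The sketch below contrasts the full observation vector ($L = I$) with the reduced statistic $L = X'W^{-}$. As before, $W = V + XX'$ is an assumption of the example rather than something stated above, and the helper `contains` and the random data are hypothetical; NumPy is assumed.

```python
import numpy as np

def contains(A, B, tol=1e-8):
    """True if R(B) is a subspace of R(A), checked by comparing ranks."""
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    rank_AB = np.linalg.matrix_rank(np.hstack([A, B]), tol=tol)
    return rank_AB == rank_A

rng = np.random.default_rng(6)
n = 6
B = rng.standard_normal((n, 3))
V = B @ B.T
X = rng.standard_normal((n, 2))
W = V + X @ X.T                      # ASSUMPTION: one common choice of W

L_id = np.eye(n)                     # the whole sample y
L_blue = X.T @ np.linalg.pinv(W, rcond=1e-10)

# Linear sufficiency: R(X) inside R(W L').
sufficient_id = contains(W @ L_id.T, X)
sufficient_blue = contains(W @ L_blue.T, X)
# Linear completeness: R(L V) inside R(L X).
complete_id = contains(L_id @ X, L_id @ V)
complete_blue = contains(L_blue @ X, L_blue @ V)
```

In this example the full sample is linearly sufficient but not linearly complete, whereas the reduced statistic has both properties.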

Generally a linearly sufficient $Ly$ does not provide all the information about $\sigma^2$ contained in the sample. This deficiency can be compensated when, in addition to $Ly$, one or more quadratic forms are considered. One would now like to extend the idea of linear sufficiency to this situation, but among the four definitions above only Definition 4.2 can serve this purpose satisfactorily.

Definition 4.9: $(Ly, y'Ty)$ is called quadratically sufficient if $Ly$ is linearly sufficient and the residual sum of squares can be expressed as

$$y'L'ALy + a\, y'Ty \quad \text{for some symmetric } A \text{ and real } a.$$

Note that the residual sum of squares is a minimum variance unbiased estimator of $\sigma^2 \times$ (degrees of freedom) if $y$ is normal. Things become rather more complicated when one allows for more than one quadratic form while $V$ is singular. With the above definition, however, the following can be proved (cf. Seely (1978), Müller (1982)).

Proposition 4.10. (a) $(Ly, y'Ty)$ is quadratically sufficient if and only if for some $a \in \mathbb{R}$,

$$\ker L \cap R(W) \subset V(\ker X') \cap \ker X'T \cap \ker(I - aVT).$$

(b) If $y$ is normal, a quadratically sufficient $(Ly, y'Ty)$ is sufficient. It is complete if $Ly$ is complete.

(c) If $y$ is normal, a sufficient $(Ly, y'Ty)$ is quadratically sufficient provided one of the following two conditions holds.

(i) $y'Ty \ge 0$ a.s. (i.e., $WTW$ is positive semidefinite).

(ii) $y'Ty$ is invariant a.s. (i.e., $WTX = 0$) and $Ly$ is complete (i.e., $R(LV) \subset R(LX)$).